Emergence of large cliques in random scale-free networks

In a network, cliques are fully connected subgraphs that reveal the tight communities
present in it. Cliques of size $c > 3$ are present in random Erdős-Rényi graphs only in the
limit of diverging average connectivity. Starting from the finding that real scale-free graphs
have large cliques, we study the clique number in uncorrelated scale-free networks, finding both
upper and lower bounds. Interestingly, we find that in scale-free networks large cliques appear
also when the average degree is finite, i.e. even for networks with power-law degree distribution
exponents $\gamma \in (2, 3)$. Moreover, as long as $\gamma < 3$, scale-free networks have a
maximal clique which diverges with the system size.

Scale-free graphs have been recently found to encode the complex structure of many different
systems, ranging from the Internet to the protein interaction networks of various organisms
[1, 2]. This topology is clearly well distinguished from the Erdős and Rényi (ER) [3] random
graphs, in which every pair of nodes has the same probability $p$ to be linked. In fact, while
scale-free graphs have a power-law degree distribution $P(k) \sim k^{-\gamma}$ and a diverging
second moment $\langle k^2 \rangle$ when $\gamma < 3$, ER graphs have a Poisson degree
distribution and consequently finite fluctuations of the node degrees. The degree distribution
strongly affects the statistical properties of processes defined on the graph. For example,
percolation and epidemic spreading have very different phenomenology when defined on an ER graph
or on a scale-free graph [4, 5].

The occurrence of a skewed degree distribution also has striking consequences for the
frequency of particular subgraphs present in the network. For example, ER graphs with finite
average connectivity have a finite number of finite loops [6, 7]. On the contrary, scale-free
graphs have a number of finite loops which increases with the number $N$ of vertices, provided
that $\gamma < 3$ [8, 9]. The abundance of some subgraphs of small size, the so-called motifs,
in biological networks has been shown to be related to important functional properties selected
by evolution [10, 11, 12]. Among subgraphs, cliques play an important role. A clique of size $c$
is a complete subgraph of $c$ nodes, i.e. a subset of $c$ nodes each of which is linked to every
other. The maximal size $c_{max}$ of a clique in a graph is called the clique number. Finding the
clique number of a generic network is an NP-complete problem [13], even though it is relatively
easy to find upper ($c_+$) and lower ($c_-$) bounds [14]. The clique number also provides a lower
bound for the chromatic number of a graph, i.e. the minimal number of colors needed to color the
graph [15]. Finally, cliques and overlapping successions of cliques have been recently used to
characterize the community structure of networks [16, 17].
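To illustrate why bounds on the clique number are cheap even though the exact problem is hard, the following minimal sketch computes a lower bound by growing a clique greedily and an upper bound from the degeneracy of the graph (a clique of size $c$ needs $c$ nodes of degree at least $c - 1$ inside it, so $c_{max}$ cannot exceed the degeneracy plus one). Both heuristics are chosen here for illustration only and are not necessarily the ones used in the cited work; the function names and the adjacency-dictionary input format are assumptions.

```python
def greedy_clique(adj):
    """Lower bound: grow a clique greedily from every start node, keep the largest."""
    best = []
    for start in adj:
        clique, candidates = [start], set(adj[start])
        while candidates:
            # Add the candidate with the most neighbours among the remaining candidates.
            v = max(candidates, key=lambda u: len(adj[u] & candidates))
            clique.append(v)
            candidates &= adj[v]  # keep only nodes adjacent to every clique member
        if len(clique) > len(best):
            best = clique
    return best


def degeneracy_upper_bound(adj):
    """Upper bound: clique number <= degeneracy + 1, via minimum-degree peeling."""
    deg = {v: len(nb) for v, nb in adj.items()}
    remaining = set(adj)
    degeneracy = 0
    while remaining:
        v = min(remaining, key=deg.get)
        degeneracy = max(degeneracy, deg[v])
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                deg[u] -= 1
    return degeneracy + 1


# Toy graph (hypothetical): a 4-clique on nodes 0..3 with a tail 3-4-5.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
adj = {v: set() for v in range(6)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
print(len(greedy_clique(adj)), degeneracy_upper_bound(adj))
```

When the two numbers coincide, as on this toy graph, the clique number is determined exactly; on real networks they generally bracket it.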

In ER graphs it is very easy to show that cliques of size $3 < c \ll N$ appear in the graph
only when the average degree diverges as $\langle k \rangle \sim N^{\lambda}$ with $\lambda > 0$.
On the other hand, real scale-free networks, such as the Internet at the autonomous system level,
contain cliques of size much larger than $c = 3$. For example, Fig. 1 reports upper and lower
bounds $c_+, c_-$ [14] for the size of the maximal clique of the Internet and of the protein
interaction networks of C. elegans and yeast [18]. This shows that scale-free networks can have
large cliques and that the clique number of the Internet graphs increases with the network
size $N$.
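The ER statement can be checked with a first-moment count: in a graph with link probability $p$, each of the $\binom{N}{c}$ subsets of $c$ nodes is a clique with probability $p^{c(c-1)/2}$. The sketch below evaluates this average at fixed mean degree $\langle k \rangle = pN$; the function name and the parameter values are illustrative assumptions.

```python
from math import comb


def er_mean_cliques(n, c, mean_degree):
    """Average number of c-cliques in an ER graph with p = mean_degree / n."""
    p = mean_degree / n
    return comb(n, c) * p ** (c * (c - 1) // 2)


for n in (10**3, 10**4, 10**5):
    print(n, er_mean_cliques(n, 3, 4.0), er_mean_cliques(n, 4, 4.0))
```

At fixed mean degree the average number of triangles stays close to $\langle k \rangle^3 / 6$, while the average number of 4-cliques decays as $N^{-2}$, so larger cliques require a mean degree that grows with $N$.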

Is the presence of such large cliques a peculiar property of how these networks are wired, or
is it a typical property of networks with such a broad distribution of degrees? This letter
addresses this question and shows that random scale-free networks do indeed contain cliques of
size much larger than $c = 3$. We shall do this by computing the first two moments of the number
$\mathcal{N}_c$ of cliques of size $c$ in a network of $N$ nodes. These provide upper and lower
bounds for the probability $P(\mathcal{N}_c > 0)$ of finding cliques of size $c$ in a network
through the inequalities

$$\frac{\langle \mathcal{N}_c \rangle^2}{\langle \mathcal{N}_c^2 \rangle} \le P(\mathcal{N}_c > 0) \le \langle \mathcal{N}_c \rangle. \qquad (1)$$
Here and in the following the notation $\langle \ldots \rangle$ will be used for statistical
averages. Eq. (1) in turn provides upper and lower bounds for the clique number,
$\underline{c} \le c_{max} \le \bar{c}$. Indeed, if $\langle \mathcal{N}_c \rangle \to 0$ for
$c > \bar{c}$ as $N \to \infty$, we can conclude that no clique of size larger than $\bar{c}$ can
be found. Likewise, if for $c = \underline{c}$ the ratio
$\langle \mathcal{N}_c \rangle^2 / \langle \mathcal{N}_c^2 \rangle$ stays finite, then cliques of
size $c \le \underline{c}$ can be found in the network with at least a finite probability. The
results indicate that the findings in Fig. 1 are to be expected, given the scale-free nature of
these graphs. Our predictions are summarized in Table I. We find that the ER result
$c_{max} = 3$ extends to random scale-free networks with $\gamma > 3$, whereas for $\gamma < 3$
the clique number $c_{max}$ diverges with the network size $N$ in a way which is extremely
sensitive to the degree distribution of the most connected nodes, i.e. to the precise definition
of the cutoff.
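The two inequalities of Eq. (1) are the Markov and Cauchy-Schwarz bounds for a non-negative integer random variable, so they also hold for sample moments. The sketch below checks this on triangle counts in small ER graphs; the graph size, link probability, number of samples and all function names are illustrative assumptions, and the brute-force count is only meant for very small graphs.

```python
import random
from itertools import combinations


def sample_er_edges(n, p, rng):
    """Edge set of one ER graph, stored as pairs (u, v) with u < v."""
    return {(u, v) for u, v in combinations(range(n), 2) if rng.random() < p}


def count_cliques(n, edges, c):
    """Count complete subgraphs on c nodes by brute force (small n only)."""
    return sum(
        all(pair in edges for pair in combinations(nodes, 2))
        for nodes in combinations(range(n), c)
    )


def moment_bounds(counts):
    """Return (<N>^2 / <N^2>, P(N > 0), <N>) computed from a list of counts."""
    mean = sum(counts) / len(counts)
    second = sum(x * x for x in counts) / len(counts)
    prob = sum(x > 0 for x in counts) / len(counts)
    lower = mean * mean / second if second > 0 else 0.0
    return lower, prob, mean


rng = random.Random(0)
counts = [count_cliques(12, sample_er_edges(12, 0.2, rng), 3) for _ in range(500)]
lower, prob, upper = moment_bounds(counts)
print(lower, prob, upper)
```

The printed triple is always ordered, which is the content of Eq. (1); the bounds become informative when the lower one stays finite or the upper one vanishes.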

The results of Table I are derived for the hidden variable ensemble proposed in Refs.
[19, 20], where the link probability $p$ between two nodes is replaced by a function
$r(q_i, q_j)$ which depends on the fitness $q_i$ and $q_j$ of the end nodes $i$ and $j$. Apart
from its close relation with the ER ensemble, this choice is also convenient because it allows
for a simple generalization of the results to networks with a correlated degree distribution.
Quite similar results can be derived for the Molloy-Reed ensemble [23] with the same approach
(provided a cutoff is chosen appropriately to avoid double links among the most connected
nodes). Other ensembles, such as that of Ref. [22], instead implicitly introduce degree
correlations for highly connected nodes and therefore require a different approach [23]. Given
the extreme sensitivity of the clique number to the details of the cutoff of the degree
distribution, we also expect quite different results.

FIG. 1: The lower bound $c_-$ (filled symbols) and the upper bound $c_+$ (empty symbols) of the
clique number of the Internet graphs (circles) and of the protein interaction networks of
E. coli and yeast (triangles) [18] are shown as a function of the network size $N$. The lines
(null hypothesis on Internet data) and the triangles pointing down (null hypothesis on protein
interaction networks) indicate the upper bound (dashed line and empty symbols) and the lower
bound (solid line and filled symbols) computed from Eqs. (9) and (15) for random graphs
constructed with the same properties as the considered real graphs.

Hidden variable network ensemble. As in Ref. [19], we generate a realization of a scale-free
network by the following procedure: i) assign to each node $i$ of the graph a hidden continuous
variable $q_i$ distributed according to a distribution $\rho(q)$; ii) link each pair of nodes
with hidden variables $q, q'$ with probability $r(q, q')$. For random scale-free networks with
uncorrelated degree distribution, we take $\rho(q) = \rho_0 q^{-\gamma}$ for $q \in [m, Q]$ and

$$r(q, q') = \frac{q q'}{\langle q \rangle N}. \qquad (2)$$
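The average degree $\langle k \rangle = \langle q \rangle$ is equal to the average fitness, and
it diverges as $N \to \infty$ for $\gamma < 2$. Likewise, the degree $k_i$ of node $i$ follows a
Poisson distribution with average $q_i$. Notice that a cutoff is needed in $\rho(q)$ to keep the
linking probability $r(q, q')$ smaller than one. In particular, we will require

$$Q = \sqrt{(1 - \epsilon) \langle q \rangle N}, \qquad (3)$$

so that $r(Q, Q) = 1 - \epsilon$. For $\gamma > 3$, values of $q_i \approx Q$ will never occur, as
the maximal $q_i \sim N^{1/(\gamma - 1)} \ll Q$. We shall see that this is immaterial for the
clique number, however. Instead, for $\gamma < 2$, $\langle q \rangle$ diverges with the cutoff,
and hence $Q \sim N^{1/\gamma}$.

TABLE I: Scaling of the theoretically estimated upper and lower bounds of the clique number of
random scale-free networks with different exponents $\gamma$ of the degree distribution. The
precise definitions of $\bar{c}$ and $\underline{c}$, together with the expressions for the
constants $b$, $b'$, are given in the text.

    $\gamma > 3$:                      $c_{max} = 3$
    $2 < \gamma < 3$, $\epsilon = 0$:  $\bar{c} \sim N^{(3-\gamma)/4}$,  $\underline{c} \sim \bar{c}^{2/3}$
    $2 < \gamma < 3$, $\epsilon > 0$:  $\bar{c} \simeq (3-\gamma) \log N / |\log(1-\epsilon)|$,  $\underline{c} = (1-\alpha) \bar{c}$
    $1 < \gamma < 2$, $\epsilon = 0$:  $\bar{c} \sim N^{1/(2\gamma)}$,  $\underline{c} \sim \bar{c}^{2/3}$
    $1 < \gamma < 2$, $\epsilon > 0$:  $\bar{c} \sim \log N / |\log(1-\epsilon)|$,  $\underline{c} = (1-\alpha) \bar{c}$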
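The generation procedure i)-ii) with the cutoff of Eq. (3) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the fitness is drawn by inverse-transform sampling, $\langle q \rangle$ is taken as the analytic mean of the truncated power law, and the cutoff is obtained by iterating Eq. (3) until it is self-consistent. All function names and parameter values are hypothetical, and the formulas assume $\gamma \neq 1, 2$.

```python
import math
import random


def truncated_mean(gamma, m, q_max):
    """Mean of rho(q) proportional to q^-gamma on [m, q_max] (gamma not 1 or 2)."""
    norm = (q_max ** (1 - gamma) - m ** (1 - gamma)) / (1 - gamma)
    first = (q_max ** (2 - gamma) - m ** (2 - gamma)) / (2 - gamma)
    return first / norm


def structural_cutoff(n, gamma, m, eps, iterations=100):
    """Iterate Q = sqrt((1 - eps) <q> N), Eq. (3), with <q> depending on Q."""
    q_max = math.sqrt(n)  # starting guess, must exceed m
    for _ in range(iterations):
        q_max = math.sqrt((1 - eps) * truncated_mean(gamma, m, q_max) * n)
    return q_max


def sample_fitness(n, gamma, m, q_max, rng):
    """Inverse-transform sampling of the truncated power law."""
    a = 1.0 - gamma
    lo, hi = m ** a, q_max ** a
    return [(lo + rng.random() * (hi - lo)) ** (1.0 / a) for _ in range(n)]


def hidden_variable_graph(n, gamma, m, eps, rng):
    """Return fitness values, edge list, cutoff Q and <q> for one realization."""
    q_max = structural_cutoff(n, gamma, m, eps)
    mean_q = truncated_mean(gamma, m, q_max)
    q = sample_fitness(n, gamma, m, q_max, rng)
    edges = [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if rng.random() < q[i] * q[j] / (mean_q * n)  # link probability, Eq. (2)
    ]
    return q, edges, q_max, mean_q


q, edges, q_max, mean_q = hidden_variable_graph(300, 2.5, 2.0, 0.1, random.Random(1))
print(len(edges), q_max, mean_q)
```

Because every $q_i \le Q$, the link probability never exceeds $r(Q, Q) = 1 - \epsilon$, which is the role of the cutoff discussed above. The double loop is quadratic in $N$ and is only practical for small networks.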

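Average number of cliques. A clique of size $c$ is a set of $c$ distinct nodes
$C = \{i_1, \ldots, i_c\}$, each one connected with all the others. For each choice of the
nodes, the probability that they are connected in a clique is

$$\prod_{i < j \in C} r(q_i, q_j) = \prod_{i \in C} \left( \frac{q_i}{\sqrt{\langle q \rangle N}} \right)^{c-1}, \qquad (4)$$

where we used Eq. (2). Fixing a small fitness interval $\Delta q$, let $n(q)$ be the number of
nodes $i \in C$ with fitness $q_i \in (q, q + \Delta q)$. The number of ways in which we can pick
$c$ nodes in the network with $n(q)$ nodes with fitness $q$ can be expressed by combinatorial
factors. Hence, with the shorthand $\tilde{q} = q / \sqrt{\langle q \rangle N}$ and with
$N(q) = N \rho(q) \Delta q$ the number of nodes with fitness in $(q, q + \Delta q)$,

$$\langle \mathcal{N}_c \rangle = \sum_{\{n(q)\}} \prod_q \binom{N(q)}{n(q)} \tilde{q}^{(c-1) n(q)}, \qquad (5)$$

where the sum is extended to all the sequences $\{n(q)\}$ satisfying $\sum_q n(q) = c$.
Introducing this constraint by a delta function, we can perform the resulting integral by the
saddle point method, i.e.

$$\langle \mathcal{N}_c \rangle = \int \frac{d\omega}{2\pi} \, e^{N f(i\omega)} \simeq \frac{e^{N f(y^*)}}{\sqrt{2\pi N |f''(y^*)|}}, \qquad (6)$$

where $f(y) = \frac{c}{N} y + \langle \log [ 1 + \tilde{q}^{c-1} e^{-y} ] \rangle$, and we have
taken the limit $\Delta q \to 0$. In Eq. (6), $y^*$ is fixed by the saddle point condition

$$\frac{c}{N} = \left\langle \frac{\tilde{q}^{c-1} e^{-y^*}}{1 + \tilde{q}^{c-1} e^{-y^*}} \right\rangle. \qquad (7)$$

We present here an asymptotic estimate of $\langle \mathcal{N}_c \rangle$. Slightly more refined
arguments, which do not add much to the understanding given here, can be used to derive an upper
bound [?]. In the limit $N \to \infty$, the left hand side of Eq. (7) is small, hence to a good
approximation $c \simeq N \langle \tilde{q}^{c-1} \rangle e^{-y^*}$. Inserting this in Eq. (6)
we find

$$\langle \mathcal{N}_c \rangle \simeq \frac{1}{\sqrt{2\pi c}} \left[ \frac{N e \langle \tilde{q}^{c-1} \rangle}{c} \right]^c. \qquad (8)$$

Therefore, in order to have $\langle \mathcal{N}_c \rangle \to 0$ it is sufficient to take
$c > \bar{c}$, where $\bar{c}$ is the solution of

$$N e \langle \tilde{q}^{\bar{c}-1} \rangle = \bar{c}. \qquad (9)$$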
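The estimate of Eq. (8) and the condition of Eq. (9) are easy to evaluate numerically, since the moments of a truncated power law are known in closed form. The sketch below is one possible way to do this and makes several assumptions: $\langle q \rangle$ and the cutoff are obtained self-consistently from Eq. (3), the search runs over integer clique sizes, and all function names and parameter values are hypothetical.

```python
import math


def fitness_moment(k, gamma, m, q_max):
    """k-th moment of rho(q) proportional to q^-gamma on [m, q_max]."""
    def integral(a):  # integral of q^(a - 1) dq over [m, q_max]
        return math.log(q_max / m) if a == 0 else (q_max ** a - m ** a) / a
    return integral(k + 1 - gamma) / integral(1 - gamma)


def structural_cutoff(n, gamma, m, eps, iterations=200):
    """Iterate Q = sqrt((1 - eps) <q> N), Eq. (3), with <q> depending on Q."""
    q_max = math.sqrt(n)
    for _ in range(iterations):
        q_max = math.sqrt((1 - eps) * fitness_moment(1, gamma, m, q_max) * n)
    return q_max


def clique_ratio(c, n, gamma, m, eps):
    """N e <q~^(c-1)> / c, the quantity raised to the power c in Eq. (8)."""
    q_max = structural_cutoff(n, gamma, m, eps)
    scale = fitness_moment(1, gamma, m, q_max) * n  # <q> N
    moment = fitness_moment(c - 1, gamma, m, q_max) / scale ** ((c - 1) / 2)
    return n * math.e * moment / c


def log_mean_cliques(c, n, gamma, m, eps):
    """log <N_c> from the saddle-point estimate, Eq. (8)."""
    return c * math.log(clique_ratio(c, n, gamma, m, eps)) - 0.5 * math.log(2 * math.pi * c)


def upper_bound(n, gamma, m, eps, c_limit=200):
    """Smallest integer c >= 3 for which the ratio drops below one (just above c-bar)."""
    for c in range(3, c_limit):
        if clique_ratio(c, n, gamma, m, eps) < 1:
            return c
    raise ValueError("no solution of Eq. (9) below c_limit")


for n in (10**3, 10**4, 10**5, 10**6):
    print(n, upper_bound(n, 2.5, 2.0, 0.0), upper_bound(n, 2.5, 2.0, 0.5))
```

Since $\tilde{q} \le 1$, the ratio decreases monotonically with $c$, so the first integer at which it falls below one brackets the solution $\bar{c}$ of Eq. (9).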



We now consider separately the case of scale-free networks with different exponents $\gamma$
of the degree distribution.

• Networks with $\gamma > 3$
Eq. (9) has no solution for $c > \gamma$. Indeed
$N \langle \tilde{q}^{c-1} \rangle \sim N^{(3-\gamma)/2} \to 0$ in this range. For $c < \gamma$,
the integral in $\langle \tilde{q}^{c-1} \rangle$ is no longer dominated by the upper cutoff,
and it is hence finite. Therefore $N \langle \tilde{q}^{c-1} \rangle \sim N^{(3-c)/2}$, which
implies that $\bar{c} = 3$. It is easy to see that this conclusion holds also if we take the
natural cutoff $Q = a N^{1/(\gamma-1)}$.

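• Networks with $2 < \gamma < 3$
Using Eq. (3), Eq. (9) becomes

$$\frac{\bar{c} (\bar{c} - \gamma)}{(1 - \epsilon)^{(\bar{c} - \gamma)/2}} = b N^{(3-\gamma)/2} \qquad (10)$$

for $b = (\gamma - 1) m^{\gamma-1} e \langle q \rangle^{(1-\gamma)/2}$. The solution depends
crucially on whether $\epsilon = 0$ or not. In the former case $\bar{c} \sim N^{(3-\gamma)/4}$
increases as a power law of the system size, whereas for $\epsilon > 0$ it increases only as
$\log N / |\log(1 - \epsilon)|$, as detailed in Table I.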

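• Networks with $1 < \gamma < 2$
Taking into account the divergence of $\langle q \rangle$ and $Q \sim N^{1/\gamma}$, Eq. (9)
becomes

$$\frac{\bar{c} (\bar{c} - \gamma)}{(1 - \epsilon)^{(\bar{c} - \gamma)/2}} = b' N^{1/\gamma} \qquad (11)$$

with $b' \simeq \{ (\gamma - 1) [m (2 - \gamma)]^{\gamma-1} \}^{1/\gamma}$. Again, for
$\epsilon = 0$ and $\epsilon > 0$ we find different results, $\bar{c} \sim N^{1/(2\gamma)}$ and
$\bar{c} \sim \log N / |\log(1 - \epsilon)|$ respectively (see Table I).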
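The two regimes for $2 < \gamma < 3$ can be seen by solving Eq. (10) numerically. The sketch below does so by bisection on the logarithm of Eq. (10); it is an illustration under assumptions, not part of the original derivation: $\langle q \rangle$ is replaced by its large-$N$ value $m(\gamma-1)/(\gamma-2)$ for an untruncated power law, and the function name and parameter values are hypothetical.

```python
import math


def solve_c_bar(n, gamma, m, eps):
    """Solve Eq. (10) for c-bar by bisection (valid for 2 < gamma < 3, 0 <= eps < 1)."""
    mean_q = m * (gamma - 1) / (gamma - 2)  # assumption: large-N value of <q>
    b = (gamma - 1) * m ** (gamma - 1) * math.e * mean_q ** ((1 - gamma) / 2)
    target = math.log(b) + 0.5 * (3 - gamma) * math.log(n)

    def log_lhs(c):
        # Logarithm of the left hand side of Eq. (10); increasing in c for c > gamma.
        return math.log(c) + math.log(c - gamma) - 0.5 * (c - gamma) * math.log(1 - eps)

    lo, hi = gamma, gamma + 1.0
    while log_lhs(hi) < target:  # expand the bracket until it contains the root
        hi *= 2
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if log_lhs(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


c1 = solve_c_bar(1e12, 2.5, 2.0, 0.0)
c2 = solve_c_bar(1e16, 2.5, 2.0, 0.0)
print(math.log(c2 / c1) / math.log(1e4), (3 - 2.5) / 4)  # effective vs predicted exponent
print(solve_c_bar(1e12, 2.5, 2.0, 0.1), solve_c_bar(1e16, 2.5, 2.0, 0.1))
```

For $\epsilon = 0$ the effective exponent measured between two large sizes approaches $(3-\gamma)/4$, while any $\epsilon > 0$ reduces the growth of $\bar{c}$ to a logarithmic one.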



Second moment of the average number of cliques. When computing the average number of some
particular subgraphs in a random network ensemble, the result might be dominated by extremely
rare graphs with an anomalously large number of such subgraphs. In these cases, the average
number of a subgraph does not provide a reliable indication of its typical value. In order to
gain more insight into the characteristics of typical networks, we use the classical relation
Eq. (1) of probability theory, which provides a lower bound for the probability that a typical
graph contains at least one clique of size $c$. This requires us to compute the second moment
$\langle \mathcal{N}_c^2 \rangle$ of the number of cliques of size $c$ in the random graph
ensemble. To do this calculation, we count the average number of pairs of cliques of size $c$
present in the graph with an overlap of $o = 0, \ldots, c$ nodes. We use the notation
$\{n(q)\}$ to indicate the number of nodes with fitness $q$ belonging to the first clique,
$\{n_o(q)\}$ to indicate the number of nodes belonging to the overlap, and $\{n'(q)\}$ to
indicate the number of nodes belonging to the second clique but not to the overlap. We consider
only sequences $\{n(q)\}$, $\{n'(q)\}$, $\{n_o(q)\}$ which satisfy $\sum_q n(q) = c$,
$\sum_q n_o(q) = o$ and $\sum_q n'(q) = c - o$.

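With these conditions, following the same steps as for $\langle \mathcal{N}_c \rangle$, we get

$$\langle \mathcal{N}_c^2 \rangle \simeq \sum_{o=0}^{c} \int dy \int dy^o \int dy' \, e^{N f(y, y', y^o, \tilde{q})}, \qquad (12)$$

where

$$f(y, y', y^o, \tilde{q}) = \frac{1}{N} \left[ y c + y' (c - o) + y^o o \right] + \left\langle \log \left[ 1 + (e^{-y} + e^{-y'}) \tilde{q}^{c-1} + e^{-(y + y^o)} \tilde{q}^{2c-o-1} \right] \right\rangle. \qquad (13)$$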

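The evaluation of this integral by saddle point is straightforward. The key idea is that, in
order to have $\langle \mathcal{N}_c^2 \rangle$ of the same order as
$\langle \mathcal{N}_c \rangle^2$, one needs to require that the sum is dominated by
configurations with non-overlapping cliques ($o \simeq 0$). Using the estimate of
$\langle \mathcal{N}_c \rangle$ derived above and the definition of $\bar{c}$, for $\gamma < 3$
we arrive at

$$P(\mathcal{N}_c > 0) \ge \frac{\langle \mathcal{N}_c \rangle^2}{\langle \mathcal{N}_c^2 \rangle} \ge \left[ \ldots \frac{c (c - \gamma) (1 - \epsilon)^{(\bar{c} - c)/2} \, e}{\bar{c} (\bar{c} - \gamma)} \ldots \right]. \qquad (14)$$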



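The lower bound for the clique number will depend on $\epsilon$ and $\bar{c}$.

In the case $\epsilon = 0$, let us define the clique size $\underline{c}$ satisfying

$$\frac{\underline{c}^{\,2} (\underline{c} - \gamma) \, e}{\bar{c} (\bar{c} - \gamma)} = \ldots, \qquad (15)$$

i.e. $\underline{c} \sim \bar{c}^{2/3}$.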
From Eq. (14) and the definition of $\underline{c}$ it follows that, as
$N, \bar{c} \to \infty$, the probability to have at least one clique of size
$c = \underline{c}$ is finite, i.e.

$$P(\mathcal{N}_c > 0) \ge \ldots \qquad (16)$$

Instead, in the case $\epsilon > 0$, for any $\alpha > 0$ the r.h.s. of Eq. (14) is very close
to 1 for clique sizes $c = (1 - \alpha) \bar{c}$ and $\bar{c} \gg 1/(\alpha \epsilon)$, i.e.

$$P(\mathcal{N}_c > 0) \simeq 1. \qquad (17)$$

This implies that for $\epsilon > 0$ the lower bound $\underline{c} = (1 - \alpha) \bar{c}$ is
very close to the upper bound for very large networks.

Conclusions. In conclusion, we have calculated upper and lower bounds for the maximal clique
size $c_{max}$ in uncorrelated scale-free networks, showing that $c_{max}$ diverges with the
network size $N$ as long as $\gamma < 3$. In particular, large cliques are present in scale-free
networks with $\gamma \in (2, 3)$ and finite average degree. It is suggestive to relate the
emergence of large cliques for $\gamma < 3$ to the persistence up to zero temperature of long
range order in spin models defined on these graphs [24]. These results were derived within the
hidden variable ensemble [19, 20], but the same method can be extended to other ensembles,
including those with correlated degrees.

In Fig. 1 we compare the upper and lower bounds derived here for random scale-free graphs with
the estimated clique number of real networks. These networks have many nodes with degree larger
than the structural cutoff. Networks with such highly connected nodes cannot be considered as
uncorrelated. The best approximation, within the class of uncorrelated networks discussed here,
is provided by those with maximal cutoff ($\epsilon = 0$). The bounds of Fig. 1 have been
derived from Eqs. (9) and (15), assuming a random network with i) an exponent $\gamma$ as
measured from real data, ii) the same number of nodes and links (i.e. the same average degree)
and iii) a structural cutoff given by Eq. (3) with $\epsilon = 0$. Also notice that
$\epsilon = 0$ yields the least stringent bounds.

Fig. 1 shows that generally the largest clique size $c_{max}$ of real networks falls well
within our bounds. Of course, accounting for the presence of correlations in the degree of
highly connected nodes in these networks may provide more precise estimates. We saw that our
estimates are very sensitive to the tails of the degree distribution, and we expect them to
depend strongly also on the nature of degree correlations. Preliminary results, extending the
present calculation to correlated networks [?] where $r(q, q') = 1 - e^{-\alpha q q'}$ with the
natural cutoff $Q \sim N^{1/(\gamma-1)}$, indicate that the clique number can take values a
factor two bigger than in real data [?]. These preliminary results underline the importance of
extending this approach to correlated networks.